Detecting Polysemy in Hard and Soft Cluster Analyses of German Preposition Vector Spaces
Abstract
This paper presents a methodology for identifying polysemous German prepositions by exploring their vector-spatial properties. We apply two cluster evaluation metrics, the Silhouette Value (Kaufman and Rousseeuw, 1990) and a fuzzy version of the V-Measure (Rosenberg and Hirschberg, 2007), as well as various correlations, to exploit hard vs. soft cluster analyses based on Self-Organising Maps. Our main hypothesis is that polysemous prepositions are outliers and thus appear as either (i) singletons or (ii) marginals of the clusters within a cluster analysis. Our analyses demonstrate that (a) in a subset of the clusterings, singletons tend to contain polysemous prepositions; and (b) misclassification and cluster membership rate exhibit a moderate correlation with ambiguity rate.
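The Silhouette Value mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and toy 2-D points; the paper's actual preposition vectors, distance measure, and Self-Organising Map clustering are not reproduced here, and all names (`silhouette_values`, `points`, `labels`) are hypothetical. The sketch follows the usual convention that a singleton cluster receives a value of 0.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point, using Euclidean distance."""
    # Group point indices by their (hard) cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: the value is set to 0 by convention.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Toy data (an assumption for illustration, not data from the paper):
# cluster 0 has two central points and one marginal point,
# cluster 1 is compact, and cluster 2 is a singleton.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (2, 2), (10, 0)]
labels = [0, 0, 1, 1, 0, 2]
values = silhouette_values(points, labels)

for point, label, value in zip(points, labels, values):
    print(point, label, round(value, 3))
```

In this toy setting, the point at the edge of cluster 0 receives the lowest value within its cluster, which is the sense in which a low silhouette value can flag a marginal member.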
Similar resources
Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features
This paper addresses an automatic classification of preposition types in German, comparing hard and soft clustering approaches and various window- and syntax-based co-occurrence features. We show that (i) the semantically most salient preposition features (i.e., subcategorised nouns) are the most successful, and that (ii) soft clustering approaches are required for the task but reveal quite diffe...
A Rank-based Distance Measure to Detect Polysemy and to Determine Salient Vector-Space Features for German Prepositions
This paper addresses vector space models of prepositions, a notoriously ambiguous word class. We propose a rank-based distance measure to explore the vector-spatial properties of the ambiguous objects, focusing on two research tasks: (i) to distinguish polysemous from monosemous prepositions in vector space; and (ii) to determine salient vector-space features for a classification of preposition...
An Annotation Schema for Preposition Senses in German
Prepositions are highly polysemous. Yet, little effort has been spent to develop language-specific annotation schemata for preposition senses to systematically represent and analyze the polysemy of prepositions in large corpora. In this paper, we present an annotation schema for preposition senses in German. The annotation schema includes a hierarchical taxonomy and also allows multiple annotati...
Fauna and frequency of hard ticks of livestock in South Khorasan province in 2018: Short Communication
Identification of hard tick species and their hosts is essential for the development of control and prevention programs for tick-borne diseases. In this descriptive cross-sectional study, ticks were collected from sheep, goats, and camels in different regions of South Khorasan province, Iran, in 2018 using a cluster sampling method. Fauna and frequency of ticks were recorded and analyzed in S...
Verb polysemy and frequency effects in thematic fit modeling
While several data sets for evaluating thematic fit of verb-role-filler triples exist, they do not control for verb polysemy. Thus, it is unclear how verb polysemy affects human ratings of thematic fit and how best to model that. We present a new dataset of human ratings on high vs. low-polysemy verbs matched for verb frequency, together with high vs. low-frequency and well-fitting vs. poorly-f...